Big science related changes #1407

Merged
jeffra merged 8 commits into master from big-science-v2
Sep 29, 2021

@jeffra commented Sep 28, 2021

This adds several changes to master that were originally only in the big-science branch. The intent is to bring the big science project over to DeepSpeed master for development and testing; they can then freeze their DeepSpeed version at a later date.

A few significant changes this brings:
(1) NCCL send/recv support in pipeline parallelism
(2) support for newer Megatron, e.g., https://github.com/microsoft/megatron-deepspeed
(3) support for querying the grad norm from the engine
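
For readers unfamiliar with item (3), here is a minimal sketch of what a "global grad norm" usually means: the L2 norm taken over every gradient value in the model, which can also be rebuilt from per-partition norms. This is plain Python for illustration only, not DeepSpeed's implementation. The function names `global_grad_norm` and `combine_partition_norms` are hypothetical and do not come from this PR.

```python
import math


def global_grad_norm(grads_per_param):
    """Return the L2 norm over all gradient values.

    `grads_per_param` is a list of flat lists of floats, one per parameter.
    (Hypothetical helper for illustration; not a DeepSpeed API.)
    """
    total_sq = 0.0
    for grad in grads_per_param:
        # Accumulate the squared magnitude of each parameter's gradient.
        total_sq += sum(g * g for g in grad)
    return math.sqrt(total_sq)


def combine_partition_norms(partition_norms):
    """Combine L2 norms computed separately on disjoint partitions.

    Because the partitions do not overlap, the global norm is the square
    root of the sum of squared partition norms.
    """
    return math.sqrt(sum(n * n for n in partition_norms))


if __name__ == "__main__":
    grads = [[3.0, 0.0], [0.0, 4.0]]
    print(global_grad_norm(grads))
    # The same value recovered from per-parameter norms:
    print(combine_partition_norms([global_grad_norm([g]) for g in grads]))
```

Logging a value like this per step is a common way to spot exploding or vanishing gradients during training.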

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
@jeffra mentioned this pull request Sep 28, 2021
1. Rename GPT2ModelPipe to MockGPT2ModelPipe, due to hacks associated with the GPT2ModelPipe name.
2. Hardcode the attention mask in the mock GPT model pipe; newer pipeline parallelism (PP) requires stashing the attention mask to get around issues with bool dtypes.
@jeffra enabled auto-merge (squash) September 29, 2021 22:33
@jeffra merged commit e2fdd25 into master Sep 29, 2021